feat(fc): drain virtio-balloon free-page-hinting before pause #2552

Draft
ValentaTomas wants to merge 1 commit into feat/uffd-fc-free-page-reporting-integration from feat/sandbox-pause-fph

@ValentaTomas ValentaTomas commented May 4, 2026

Drains virtio-balloon free-page-hinting before pause so snapshots don't capture pages the guest already considers free. The balloon (from the parent FPR PR) is always installed with FreePageHinting=true; on pause we call start_balloon_hinting and poll describe_balloon_hinting until guest_cmd >= host_cmd, with a host_cmd > 0 guard so a transient nil or zero response is not read as completion. Reclaimed pages emit UFFD_EVENT_REMOVE, which the parent PR already tracks.

Gated by free-page-hinting-timeout-ms LD flag (ms; default 0 = disabled). Operator opts in once the kernel has the FPH race fix. Stacked on parent FPR branch for the shared balloon-install path; split out from #2550.

cursor Bot commented May 4, 2026

PR Summary

Medium Risk
Touches the Firecracker VM lifecycle and snapshot/pause path and adds new polling logic against FC balloon APIs; although gated by a feature flag and designed to no-op when unsupported, regressions could impact snapshot latency or pause reliability.

Overview
Adds an optional pre-pause "balloon drain" that triggers virtio-balloon free-page-hinting and waits for guest acknowledgement so snapshots avoid capturing pages the guest already considers free, gated by the free-page-hinting-timeout-ms feature flag (default off) with an override exposed in resume-build. This also refactors balloon setup into an installBalloon call that can enable free-page-reporting and free-page-hinting independently based on Firecracker/kernel support, and extends version gating to detect the FC free-page-hinting API.

Reviewed by Cursor Bugbot for commit 7619cc9. Bugbot is set up for automated code reviews on this repo.

Comment thread packages/orchestrator/pkg/sandbox/fc/process.go Outdated

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit bf00edc.

Comment thread packages/orchestrator/pkg/sandbox/fc/process.go
@ValentaTomas ValentaTomas force-pushed the feat/sandbox-pause-fph branch 3 times, most recently from 263a0d0 to f4e3ab0 on May 4, 2026 00:51

Arm free-page-hinting on the existing balloon device (always set when
the balloon is installed; pure runtime toggle), and on pause do a
host-initiated hint+wait so MADV_DONTNEED-reclaimed pages are settled
before the snapshot. Pages reclaimed this way generate UFFD_EVENT_REMOVE,
which the orchestrator already tracks (parent FPR PR), so the snapshot
captures them as removed instead of zero-filled.

- fc/client.go: rename enableFreePageReporting -> installBalloon;
  always set FreePageHinting=true; add startBalloonHinting +
  describeBalloonHinting helpers.
- fc/process.go: track balloonInstalled; add DrainBalloon (start +
  poll guest_cmd >= host_cmd, with host>0 guard against transient
  nil/zero responses).
- sandbox.go: wire featureFlags into Sandbox; call DrainBalloon from
  Pause behind the flag. Failures are logged but non-fatal.

Gated by free-page-hinting-timeout-ms (LD int flag, ms; default 0 =
disabled). resume-build gains --fph-timeout-ms for local exercise.